低光环境对强大的无人驾驶汽车(UAV)跟踪也构成了巨大的挑战,即使使用最新的(SOTA)跟踪器,由于潜在的图像特征在不利的光条件下很难提取。此外,由于可见性较低,人类监视器的准确在线选择也极为难以在地面控制站中初始化无人机跟踪。为了解决这些问题,这项工作提出了一个新颖的增强剂,即凸线网,以点燃人类操作员和无人机跟踪器的潜在对象。通过采用变压器,LightlightNet可以根据全局特征调整增强参数,因此可以适应照明变化。引入了像素级范围掩模,以使光明网络更加专注于没有光源的跟踪对象和区域的增强。此外,建立了一种软截断机制,以防止背景噪声被误认为关键特征。对图像增强基准测试的评估表明,光明网络在促进人类感知方面具有优势。公共Uavdark135基准进行的实验表明,HightlightNet比其他SOTA低光增强剂更适合无人机跟踪任务。此外,在典型的无人机平台上进行的现实世界测试验证了HightlightNet在夜间航空跟踪相关应用中的实用性和效率。代码和演示视频可在https://github.com/vision4robotics/highlightnet上找到。
translated by 谷歌翻译
在基于文本的分类器中测试公平性问题的一种常见方法是通过使用反事实来:如果更改输入中的敏感属性,则分类器输出是否会更改?现有的反事实生成方法通常依赖于单词列表或模板,产生不考虑语法,上下文或微妙敏感属性引用的简单反事实,并且可能会错过WordList创建者未考虑的问题。在本文中,我们介绍了一项为克服这些缺点而产生的反事实的任务,并证明了如何利用大型语言模型(LLM)来在此任务上取得进展。我们表明,这种基于LLM的方法可以产生现有方法无法实现的复杂反事实,从而比较了民事评论数据集中各种反事实生成方法的性能,并在评估毒性分类器时显示出它们的价值。
translated by 谷歌翻译
指定的实体识别任务是信息提取的核心任务之一。单词歧义和单词缩写是命名实体低识别率的重要原因。在本文中,我们提出了一种名为“实体识别模型WCL-BBCD”(与Bert-Bilstm-Crf-Dbpedia的单词对比学习),结合了对比度学习的概念。该模型首先在文本中训练句子对,计算句子对通过余弦的相似性中的单词对之间的相似性,以及通过相似性通过相似性来命名实体识别任务的BERT模型,以减轻单词歧义。然后,将微调的BERT模型与Bilstm-CRF模型相结合,以执行指定的实体识别任务。最后,将识别结果与先验知识(例如知识图)结合使用,以减轻单词缩写引起的低速问题的识别。实验结果表明,我们的模型在Conll-2003英语数据集和Ontonotes V5英语数据集上优于其他类似的模型方法。
translated by 谷歌翻译
通过各种面部操作技术产生,由于安全问题,面部伪造检测引起了不断的关注。以前的作品总是根据交叉熵损失将面部伪造检测作为分类问题,这强调了类别级别差异,而不是真实和假面之间的基本差异,限制了看不见的域中的模型概括。为了解决这个问题,我们提出了一种新颖的面部伪造检测框架,名为双重对比学习(DCL),其特殊地构建了正负配对数据,并在不同粒度下进行了设计的对比学习,以学习广义特征表示。具体地,结合硬样品选择策略,首先提出通过特别构造实例对来促进与之相关的鉴别特征学习的任务相关的对比学习策略。此外,为了进一步探索基本的差异,引入内部内部对比学习(INL-ICL),以通过构建内部实例构建局部区域对来关注伪造的面中普遍存在的局部内容不一致。在若干数据集上的广泛实验和可视化证明了我们对最先进的竞争对手的方法的概括。
translated by 谷歌翻译
近年来,可微弱的建筑搜索(飞镖)已经受到了大量的关注,主要是因为它通过重量分享和连续放松来显着降低计算成本。然而,更近期的作品发现现有的可分辨率NAS技术难以俯视幼稚基线,产生劣化架构作为搜索所需。本文通过将体系结构权重放入高斯分布,而不是直接优化架构参数,而不是直接优化架构参数,而是作为分布学习问题。通过利用自然梯度变分推理(NGVI),可以基于现有的码票来容易地优化架构分布而不会产生更多内存和计算消耗。我们展示了贝叶斯原则的可分解NAS如何益处,提高勘探和提高稳定性。 NAS-BENCH-201和NAS-BENCH-1SHOT1基准数据集的实验结果证实了所提出的框架可以制造的重要改进。此外,我们还在学习参数上只需简单地应用argmax,我们进一步利用了NAS中最近提出的无培训代理,从优化分布中汲取的组架构中选择最佳架构,从而实现最终的架构-ART在NAS-BENCH-201和NAS-BENCH-1SHOT1基准上的结果。我们在飞镖搜索空间中的最佳架构也会分别获得2.37 \%,15.72 \%和24.2 \%的竞争性测试错误,分别在Cifar-10,CiFar-100和Imagenet数据集上。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译